Improving capacity-performance tradeoffs in the storage tier
Author: Eric Villaseñor
Abstract
Villaseñor, Eric Ph.D., Purdue University, May 2015. Improving Capacity-Performance Tradeoffs in the Storage Tier. Major Professor: Mithuna Thottethodi.
Data-set sizes are growing, and new techniques are emerging to organize and analyze these data sets. A key access pattern is emerging with these techniques: large sequential file accesses. The trend toward bigger files helps amortize the cost of data accesses from the storage layer, as many workloads are recognized to be I/O bound. The storage layer is widely recognized as the slowest layer in the system. This work focuses on the tradeoffs one can make with storage capacity to improve system performance. Capacity can be leveraged for improved availability or improved performance. This tradeoff is key in the storage layer, as it allows for data-loss prevention and bandwidth aggregation. Typically, these tradeoffs do not allow much choice with regard to capacity use. This work leverages replication as the enabling mechanism to improve the capacity-performance tradeoff in the storage tier while still providing availability. The tradeoff can be made at both the local and distributed file system levels. I propose two techniques that allow for an improved tradeoff of capacity: a local file system design, MorphStore, and a distributed file system, BoostDFS. The local file system can be employed on scale-out or scale-up infrastructures to improve performance. The distributed file system targets distributed frameworks, such as MapReduce, to improve cluster performance. MorphStore is a file system that significantly improves performance when accessing large files by using two innovations. MorphStore combines (a) load-adaptive I/O access scheduling to dynamically optimize throughput (aggregation), and (b) utility-
Similar resources
Improving Data Grids Performance by Using Modified Dynamic Hierarchical Replication Strategy
Abstract: A Data Grid connects a collection of geographically distributed computational and storage resources, enabling users to share data and other resources. Data replication, a technique much discussed by Data Grid researchers in recent years, creates multiple copies of a file and places them in various locations to shorten file access times. In this paper, a dynamic data replication strate...
Cheap Data Analytics using Cold Storage Devices
Enterprise databases use storage tiering to lower capital and operational expenses. In such a setting, data waterfalls from an SSD-based high-performance tier when it is "hot" (frequently accessed) to a disk-based capacity tier and finally to a tape-based archival tier when "cold" (rarely accessed). To address the unprecedented growth in the amount of cold data, hardware vendors introduced new d...
A Recovery Conscious Framework for Fault Resilient Storage Systems
This paper presents a recovery-conscious framework for improving the fault resiliency and recovery efficiency of highly concurrent embedded storage software systems. Our framework consists of a three-tier architecture and a suite of recovery-conscious techniques. In the top tier, we promote fine-grained recovery at the task level by introducing recovery scopes to model recovery dependencies...
Power-aware Proactive Storage-tiering Management for High-speed Tiered-storage Systems
Large-scale high-speed mass-storage systems account for a large part of the energy consumed at data centers. To conserve the energy consumed by these storage systems, we propose a high-speed tiered-storage system with a power-aware proactive method of storage-tiering management that minimizes loss of performance, which we call the energy-efficient High-speed Tiered-Storage system (eHiTS). eHi...
Dynamic Management of Caching Tiers
Application owners are typically concerned about getting the best performance at the lowest cost. When the workload demand for an application is dynamic, lowering the cost necessitates dynamic management of the hosting infrastructure (physical servers or virtual machines). An application deployment is often divided into multiple tiers. The frontend tier is usually stateless, and processes incom...